home *** CD-ROM | disk | FTP | other *** search
Text File | 1992-09-08 | 48.1 KB | 1,263 lines | [TEXT/EMAC] |
- Info file bison.info, produced by Makeinfo, -*- Text -*- from input
- file bison.texinfo.
-
- This file documents the Bison parser generator.
-
- Copyright (C) 1988, 1989, 1990 Free Software Foundation, Inc.
-
- Permission is granted to make and distribute verbatim copies of
- this manual provided the copyright notice and this permission notice
- are preserved on all copies.
-
- Permission is granted to copy and distribute modified versions of
- this manual under the conditions for verbatim copying, provided also
- that the sections entitled "GNU General Public License" and
- "Conditions for Using Bison" are included exactly as in the original,
- and provided that the entire resulting derived work is distributed
- under the terms of a permission notice identical to this one.
-
- Permission is granted to copy and distribute translations of this
- manual into another language, under the above conditions for modified
- versions, except that the sections entitled "GNU General Public
- License", "Conditions for Using Bison" and this permission notice may
- be included in translations approved by the Free Software Foundation
- instead of in the original English.
-
- File: bison.info, Node: Type Decl, Next: Expect Decl, Prev: Union Decl, Up: Declarations
-
- Nonterminal Symbols
- -------------------
-
- When you use `%union' to specify multiple value types, you must
- declare the value type of each nonterminal symbol for which values are
- used. This is done with a `%type' declaration, like this:
-
- %type <TYPE> NONTERMINAL...
-
- Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the
- name given in the `%union' to the alternative that you want (*note
- Union Decl::.). You can give any number of nonterminal symbols in the
- same `%type' declaration, if they have the same value type. Use
- spaces to separate the symbol names.
-
- File: bison.info, Node: Expect Decl, Next: Start Decl, Prev: Type Decl, Up: Declarations
-
- Suppressing Conflict Warnings
- -----------------------------
-
- Bison normally warns if there are any conflicts in the grammar
- (*note Shift/Reduce::.), but most real grammars have harmless
- shift/reduce conflicts which are resolved in a predictable way and
- would be difficult to eliminate. It is desirable to suppress the
- warning about these conflicts unless the number of conflicts changes.
- You can do this with the `%expect' declaration.
-
- The declaration looks like this:
-
- %expect N
-
- Here N is a decimal integer. The declaration says there should be
- no warning if there are N shift/reduce conflicts and no reduce/reduce
- conflicts. The usual warning is given if there are either more or
- fewer conflicts, or if there are any reduce/reduce conflicts.
-
- In general, using `%expect' involves these steps:
-
- * Compile your grammar without `%expect'. Use the `-v' option to
- get a verbose list of where the conflicts occur. Bison will also
- print the number of conflicts.
-
- * Check each of the conflicts to make sure that Bison's default
- resolution is what you really want. If not, rewrite the grammar
- and go back to the beginning.
-
- * Add an `%expect' declaration, copying the number N from the
- number which Bison printed.
-
- Now Bison will stop annoying you about the conflicts you have
- checked, but it will warn you again if changes in the grammer result
- in additional conflicts.
-
- File: bison.info, Node: Start Decl, Next: Pure Decl, Prev: Expect Decl, Up: Declarations
-
- The Start-Symbol
- ----------------
-
- Bison assumes by default that the start symbol for the grammar is
- the first nonterminal specified in the grammar specification section.
- The programmer may override this restriction with the `%start'
- declaration as follows:
-
- %start SYMBOL
-
- File: bison.info, Node: Pure Decl, Next: Decl Summary, Prev: Start Decl, Up: Declarations
-
- A Pure (Reentrant) Parser
- -------------------------
-
- A "reentrant" program is one which does not alter in the course of
- execution; in other words, it consists entirely of "pure" (read-only)
- code. Reentrancy is important whenever asynchronous execution is
- possible; for example, a nonreentrant program may not be safe to call
- from a signal handler. In systems with multiple threads of control, a
- nonreentrant program must be called only within interlocks.
-
- The Bison parser is not normally a reentrant program, because it
- uses statically allocated variables for communication with `yylex'.
- These variables include `yylval' and `yylloc'.
-
- The Bison declaration `%pure_parser' says that you want the parser
- to be reentrant. It looks like this:
-
- %pure_parser
-
- The effect is that the two communication variables become local
- variables in `yyparse', and a different calling convention is used for
- the lexical analyzer function `yylex'. *Note Pure Calling::, for the
- details of this. The variable `yynerrs' also becomes local in
- `yyparse' (*note Error Reporting::.). The convention for calling
- `yyparse' itself is unchanged.
-
- File: bison.info, Node: Decl Summary, Prev: Pure Decl, Up: Declarations
-
- Bison Declaration Summary
- -------------------------
-
- Here is a summary of all Bison declarations:
-
- `%union'
- Declare the collection of data types that semantic values may have
- (*note Union Decl::.).
-
- `%token'
- Declare a terminal symbol (token type name) with no precedence or
- associativity specified (*note Token Decl::.).
-
- `%right'
- Declare a terminal symbol (token type name) that is
- right-associative (*note Precedence Decl::.).
-
- `%left'
- Declare a terminal symbol (token type name) that is
- left-associative (*note Precedence Decl::.).
-
- `%nonassoc'
- Declare a terminal symbol (token type name) that is nonassociative
- (using it in a way that would be associative is a syntax error)
- (*note Precedence Decl::.).
-
- `%type'
- Declare the type of semantic values for a nonterminal symbol
- (*note Type Decl::.).
-
- `%start'
- Specify the grammar's start symbol (*note Start Decl::.).
-
- `%expect'
- Declare the expected number of shift-reduce conflicts (*note
- Expect Decl::.).
-
- `%pure_parser'
- Request a pure (reentrant) parser program (*note Pure Decl::.).
-
- File: bison.info, Node: Multiple Parsers, Prev: Declarations, Up: Grammar File
-
- Multiple Parsers in the Same Program
- ====================================
-
- Most programs that use Bison parse only one language and therefore
- contain only one Bison parser. But what if you want to parse more
- than one language with the same program? Then you need to avoid a
- name conflict between different definitions of `yyparse', `yylval',
- and so on.
-
- The easy way to do this is to use the option `-p PREFIX' (*note
- Invocation::.). This renames the interface functions and variables of
- the Bison parser to start with PREFIX instead of `yy'. You can use
- this to give each parser distinct names that do not conflict.
-
- The precise list of symbols renamed is `yyparse', `yylex',
- `yyerror', `yylval', `yychar' and `yydebug'. For example, if you use
- `-p c', the names become `cparse', `clex', and so on.
-
- *All the other variables and macros associated with Bison are not
- renamed.* These others are not global; there is no conflict if the same
- name is used in different parsers. For example, `YYSTYPE' is not
- renamed, but defining this in different ways in different parsers
- causes no trouble (*note Value Type::.).
-
- The `-p' option works by adding macro definitions to the beginning
- of the parser source file, defining `yyparse' as `PREFIXparse', and so
- on. This effectively substitutes one name for the other in the entire
- parser file.
-
- File: bison.info, Node: Interface, Next: Algorithm, Prev: Grammar File, Up: Top
-
- Parser C-Language Interface
- ***************************
-
- The Bison parser is actually a C function named `yyparse'. Here we
- describe the interface conventions of `yyparse' and the other
- functions that it needs to use.
-
- Keep in mind that the parser uses many C identifiers starting with
- `yy' and `YY' for internal purposes. If you use such an identifier
- (aside from those in this manual) in an action or in additional C code
- in the grammar file, you are likely to run into trouble.
-
- * Menu:
-
- * Parser Function:: How to call `yyparse' and what it returns.
- * Lexical:: You must supply a function `yylex' which reads tokens.
- * Error Reporting:: You must supply a function `yyerror'.
- * Action Features:: Special features for use in actions.
-
- File: bison.info, Node: Parser Function, Next: Lexical, Prev: Interface, Up: Interface
-
- The Parser Function `yyparse'
- =============================
-
- You call the function `yyparse' to cause parsing to occur. This
- function reads tokens, executes actions, and ultimately returns when it
- encounters end-of-input or an unrecoverable syntax error. You can also
- write an action which directs `yyparse' to return immediately without
- reading further.
-
- The value returned by `yyparse' is 0 if parsing was successful
- (return is due to end-of-input).
-
- The value is 1 if parsing failed (return is due to a syntax error).
-
- In an action, you can cause immediate return from `yyparse' by using
- these macros:
-
- `YYACCEPT'
- Return immediately with value 0 (to report success).
-
- `YYABORT'
- Return immediately with value 1 (to report failure).
-
- File: bison.info, Node: Lexical, Next: Error Reporting, Prev: Parser Function, Up: Interface
-
- The Lexical Analyzer Function `yylex'
- =====================================
-
- The "lexical analyzer" function, `yylex', recognizes tokens from
- the input stream and returns them to the parser. Bison does not create
- this function automatically; you must write it so that `yyparse' can
- call it. The function is sometimes referred to as a lexical scanner.
-
- In simple programs, `yylex' is often defined at the end of the Bison
- grammar file. If `yylex' is defined in a separate source file, you
- need to arrange for the token-type macro definitions to be available
- there. To do this, use the `-d' option when you run Bison, so that it
- will write these macro definitions into a separate header file
- `NAME.tab.h' which you can include in the other source files that need
- it. *Note Invocation::.
-
- * Menu:
-
- * Calling Convention:: How `yyparse' calls `yylex'.
- * Token Values:: How `yylex' must return the semantic value
- of the token it has read.
- * Token Positions:: How `yylex' must return the text position
- (line number, etc.) of the token, if the
- actions want that.
- * Pure Calling:: How the calling convention differs
- in a pure parser (*note Pure Decl::.).
-
- File: bison.info, Node: Calling Convention, Next: Token Values, Prev: Lexical, Up: Lexical
-
- Calling Convention for `yylex'
- ------------------------------
-
- The value that `yylex' returns must be the numeric code for the type
- of token it has just found, or 0 for end-of-input.
-
- When a token is referred to in the grammar rules by a name, that
- name in the parser file becomes a C macro whose definition is the
- proper numeric code for that token type. So `yylex' can use the name
- to indicate that type. *Note Symbols::.
-
- When a token is referred to in the grammar rules by a character
- literal, the numeric code for that character is also the code for the
- token type. So `yylex' can simply return that character code. The
- null character must not be used this way, because its code is zero and
- that is what signifies end-of-input.
-
- Here is an example showing these things:
-
- yylex ()
- {
- ...
- if (c == EOF) /* Detect end of file. */
- return 0;
- ...
- if (c == '+' || c == '-')
- return c; /* Assume token type for `+' is '+'. */
- ...
- return INT; /* Return the type of the token. */
- ...
- }
-
- This interface has been designed so that the output from the `lex'
- utility can be used without change as the definition of `yylex'.
-
- File: bison.info, Node: Token Values, Next: Token Positions, Prev: Calling Convention, Up: Lexical
-
- Semantic Values of Tokens
- -------------------------
-
- In an ordinary (nonreentrant) parser, the semantic value of the
- token must be stored into the global variable `yylval'. When you are
- using just one data type for semantic values, `yylval' has that type.
- Thus, if the type is `int' (the default), you might write this in
- `yylex':
-
- ...
- yylval = value; /* Put value onto Bison stack. */
- return INT; /* Return the type of the token. */
- ...
-
- When you are using multiple data types, `yylval''s type is a union
- made from the `%union' declaration (*note Union Decl::.). So when you
- store a token's value, you must use the proper member of the union.
- If the `%union' declaration looks like this:
-
- %union {
- int intval;
- double val;
- symrec *tptr;
- }
-
- then the code in `yylex' might look like this:
-
- ...
- yylval.intval = value; /* Put value onto Bison stack. */
- return INT; /* Return the type of the token. */
- ...
-
- File: bison.info, Node: Token Positions, Next: Pure Calling, Prev: Token Values, Up: Lexical
-
- Textual Positions of Tokens
- ---------------------------
-
- If you are using the `@N'-feature (*note Action Features::.) in
- actions to keep track of the textual locations of tokens and groupings,
- then you must provide this information in `yylex'. The function
- `yyparse' expects to find the textual location of a token just parsed
- in the global variable `yylloc'. So `yylex' must store the proper
- data in that variable. The value of `yylloc' is a structure and you
- need only initialize the members that are going to be used by the
- actions. The four members are called `first_line', `first_column',
- `last_line' and `last_column'. Note that the use of this feature
- makes the parser noticeably slower.
-
- The data type of `yylloc' has the name `YYLTYPE'.
-
- File: bison.info, Node: Pure Calling, Prev: Token Positions, Up: Lexical
-
- Calling for Pure Parsers
- ------------------------
-
- When you use the Bison declaration `%pure_parser' to request a pure,
- reentrant parser, the global communication variables `yylval' and
- `yylloc' cannot be used. (*Note Pure Decl::.) In such parsers the
- two global variables are replaced by pointers passed as arguments to
- `yylex'. You must declare them as shown here, and pass the
- information back by storing it through those pointers.
-
- yylex (lvalp, llocp)
- YYSTYPE *lvalp;
- YYLTYPE *llocp;
- {
- ...
- *lvalp = value; /* Put value onto Bison stack. */
- return INT; /* Return the type of the token. */
- ...
- }
-
- If the grammar file does not use the `@' constructs to refer to
- textual positions, then the type `YYLTYPE' will not be defined. In
- this case, omit the second argument; `yylex' will be called with only
- one argument.
-
- File: bison.info, Node: Error Reporting, Next: Action Features, Prev: Lexical, Up: Interface
-
- The Error Reporting Function `yyerror'
- ======================================
-
- The Bison parser detects a "parse error" or "syntax error" whenever
- it reads a token which cannot satisfy any syntax rule. A action in
- the grammar can also explicitly proclaim an error, using the macro
- `YYERROR' (*note Action Features::.).
-
- The Bison parser expects to report the error by calling an error
- reporting function named `yyerror', which you must supply. It is
- called by `yyparse' whenever a syntax error is found, and it receives
- one argument. For a parse error, the string is always `"parse error"'.
-
- The parser can detect one other kind of error: stack overflow. This
- happens when the input contains constructions that are very deeply
- nested. It isn't likely you will encounter this, since the Bison
- parser extends its stack automatically up to a very large limit. But
- if overflow happens, `yyparse' calls `yyerror' in the usual fashion,
- except that the argument string is `"parser stack overflow"'.
-
- The following definition suffices in simple programs:
-
- yyerror (s)
- char *s;
- {
-
- fprintf (stderr, "%s\n", s);
- }
-
- After `yyerror' returns to `yyparse', the latter will attempt error
- recovery if you have written suitable error recovery grammar rules
- (*note Error Recovery::.). If recovery is impossible, `yyparse' will
- immediately return 1.
-
- The variable `yynerrs' contains the number of syntax errors
- encountered so far. Normally this variable is global; but if you
- request a pure parser (*note Pure Decl::.) then it is a local variable
- which only the actions can access.
-
- File: bison.info, Node: Action Features, Prev: Error Reporting, Up: Interface
-
- Special Features for Use in Actions
- ===================================
-
- Here is a table of Bison constructs, variables and macros that are
- useful in actions.
-
- `$$'
- Acts like a variable that contains the semantic value for the
- grouping made by the current rule. *Note Actions::.
-
- `$N'
- Acts like a variable that contains the semantic value for the Nth
- component of the current rule. *Note Actions::.
-
- `$<TYPEALT>$'
- Like `$$' but specifies alternative TYPEALT in the union
- specified by the `%union' declaration. *Note Action Types::.
-
- `$<TYPEALT>N'
- Like `$N' but specifies alternative TYPEALT in the union
- specified by the `%union' declaration. *Note Action Types::.
-
- `YYABORT;'
- Return immediately from `yyparse', indicating failure. *Note
- Parser Function::.
-
- `YYACCEPT;'
- Return immediately from `yyparse', indicating success. *Note
- Parser Function::.
-
- `YYBACKUP (TOKEN, VALUE);'
- Unshift a token. This macro is allowed only for rules that reduce
- a single value, and only when there is no look-ahead token. It
- installs a look-ahead token with token type TOKEN and semantic
- value VALUE; then it discards the value that was going to be
- reduced by this rule.
-
- If the macro is used when it is not valid, such as when there is
- a look-ahead token already, then it reports a syntax error with a
- message `cannot back up' and performs ordinary error recovery.
-
- In either case, the rest of the action is not executed.
-
- `YYEMPTY'
- Value stored in `yychar' when there is no look-ahead token.
-
- `YYERROR;'
- Cause an immediate syntax error. This statement initiates error
- recovery just as if the parser itself had detected an error;
- however, it does not call `yyerror', and does not print any
- message. If you want to print an error message, call `yyerror'
- explicitly before the `YYERROR;' statement. *Note Error
- Recovery::.
-
- `YYRECOVERING'
- This macro stands for an expression that has the value 1 when the
- parser is recovering from a syntax error, and 0 the rest of the
- time. *Note Error Recovery::.
-
- `yychar'
- Variable containing the current look-ahead token. (In a pure
- parser, this is actually a local variable within `yyparse'.)
- When there is no look-ahead token, the value `YYEMPTY' is stored
- in the variable. *Note Look-Ahead::.
-
- `yyclearin;'
- Discard the current look-ahead token. This is useful primarily in
- error rules. *Note Error Recovery::.
-
- `yyerrok;'
- Resume generating error messages immediately for subsequent syntax
- errors. This is useful primarily in error rules. *Note Error
- Recovery::.
-
- `@N'
- Acts like a structure variable containing information on the line
- numbers and column numbers of the Nth component of the current
- rule. The structure has four members, like this:
-
- struct {
- int first_line, last_line;
- int first_column, last_column;
- };
-
- Thus, to get the starting line number of the third component, use
- `@3.first_line'.
-
- In order for the members of this structure to contain valid
- information, you must make `yylex' supply this information about
- each token. If you need only certain members, then `yylex' need
- only fill in those members.
-
- The use of this feature makes the parser noticeably slower.
-
- File: bison.info, Node: Algorithm, Next: Error Recovery, Prev: Interface, Up: Top
-
- The Bison Parser Algorithm
- **************************
-
- As Bison reads tokens, it pushes them onto a stack along with their
- semantic values. The stack is called the "parser stack". Pushing a
- token is traditionally called "shifting".
-
- For example, suppose the infix calculator has read `1 + 5 *', with a
- `3' to come. The stack will have four elements, one for each token
- that was shifted.
-
- But the stack does not always have an element for each token read.
- When the last N tokens and groupings shifted match the components of a
- grammar rule, they can be combined according to that rule. This is
- called "reduction". Those tokens and groupings are replaced on the
- stack by a single grouping whose symbol is the result (left hand side)
- of that rule. Running the rule's action is part of the process of
- reduction, because this is what computes the semantic value of the
- resulting grouping.
-
- For example, if the infix calculator's parser stack contains this:
-
- 1 + 5 * 3
-
- and the next input token is a newline character, then the last three
- elements can be reduced to 15 via the rule:
-
- expr: expr '*' expr;
-
- Then the stack contains just these three elements:
-
- 1 + 15
-
- At this point, another reduction can be made, resulting in the single
- value 16. Then the newline token can be shifted.
-
- The parser tries, by shifts and reductions, to reduce the entire
- input down to a single grouping whose symbol is the grammar's
- start-symbol (*note Language and Grammar::.).
-
- This kind of parser is known in the literature as a bottom-up
- parser.
-
- * Menu:
-
- * Look-Ahead:: Parser looks one token ahead when deciding what to do.
- * Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
- * Precedence:: Operator precedence works by resolving conflicts.
- * Contextual Precedence:: When an operator's precedence depends on context.
- * Parser States:: The parser is a finite-state-machine with stack.
- * Reduce/Reduce:: When two rules are applicable in the same situation.
- * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
- * Stack Overflow:: What happens when stack gets full. How to avoid it.
-
- File: bison.info, Node: Look-Ahead, Next: Shift/Reduce, Prev: Algorithm, Up: Algorithm
-
- Look-Ahead Tokens
- =================
-
- The Bison parser does *not* always reduce immediately as soon as the
- last N tokens and groupings match a rule. This is because such a
- simple strategy is inadequate to handle most languages. Instead, when
- a reduction is possible, the parser sometimes "looks ahead" at the next
- token in order to decide what to do.
-
- When a token is read, it is not immediately shifted; first it
- becomes the "look-ahead token", which is not on the stack. Now the
- parser can perform one or more reductions of tokens and groupings on
- the stack, while the look-ahead token remains off to the side. When
- no more reductions should take place, the look-ahead token is shifted
- onto the stack. This does not mean that all possible reductions have
- been done; depending on the token type of the look-ahead token, some
- rules may choose to delay their application.
-
- Here is a simple case where look-ahead is needed. These three
- rules define expressions which contain binary addition operators and
- postfix unary factorial operators (`!'), and allow parentheses for
- grouping.
-
- expr: term '+' expr
- | term
- ;
-
-
- term: '(' expr ')'
- | term '!'
- | NUMBER
- ;
-
- Suppose that the tokens `1 + 2' have been read and shifted; what
- should be done? If the following token is `)', then the first three
- tokens must be reduced to form an `expr'. This is the only valid
- course, because shifting the `)' would produce a sequence of symbols
- `term ')'', and no rule allows this.
-
- If the following token is `!', then it must be shifted immediately
- so that `2 !' can be reduced to make a `term'. If instead the parser
- were to reduce before shifting, `1 + 2' would become an `expr'. It
- would then be impossible to shift the `!' because doing so would
- produce on the stack the sequence of symbols `expr '!''. No rule
- allows that sequence.
-
- The current look-ahead token is stored in the variable `yychar'.
- *Note Action Features::.
-
- File: bison.info, Node: Shift/Reduce, Next: Precedence, Prev: Look-Ahead, Up: Algorithm
-
- Shift/Reduce Conflicts
- ======================
-
- Suppose we are parsing a language which has if-then and if-then-else
- statements, with a pair of rules like this:
-
- if_stmt:
- IF expr THEN stmt
- | IF expr THEN stmt ELSE stmt
- ;
-
- (Here we assume that `IF', `THEN' and `ELSE' are terminal symbols for
- specific keyword tokens.)
-
- When the `ELSE' token is read and becomes the look-ahead token, the
- contents of the stack (assuming the input is valid) are just right for
- reduction by the first rule. But it is also legitimate to shift the
- `ELSE', because that would lead to eventual reduction by the second
- rule.
-
- This situation, where either a shift or a reduction would be valid,
- is called a "shift/reduce conflict". Bison is designed to resolve
- these conflicts by choosing to shift, unless otherwise directed by
- operator precedence declarations. To see the reason for this, let's
- contrast it with the other alternative.
-
- Since the parser prefers to shift the `ELSE', the result is to
- attach the else-clause to the innermost if-statement, making these two
- inputs equivalent:
-
- if x then if y then win (); else lose;
-
- if x then do; if y then win (); else lose; end;
-
- But if the parser chose to reduce when possible rather than shift,
- the result would be to attach the else-clause to the outermost
- if-statement, making these two inputs equivalent:
-
- if x then if y then win (); else lose;
-
- if x then do; if y then win (); end; else lose;
-
- The conflict exists because the grammar as written is ambiguous:
- either parsing of the simple nested if-statement is legitimate. The
- established convention is that these ambiguities are resolved by
- attaching the else-clause to the innermost if-statement; this is what
- Bison accomplishes by choosing to shift rather than reduce. (It would
- ideally be cleaner to write an unambiguous grammar, but that is very
- hard to do in this case.) This particular ambiguity was first
- encountered in the specifications of Algol 60 and is called the
- "dangling `else'" ambiguity.
-
- To avoid warnings from Bison about predictable, legitimate
- shift/reduce conflicts, use the `%expect N' declaration. There will
- be no warning as long as the number of shift/reduce conflicts is
- exactly N. *Note Expect Decl::.
-
- File: bison.info, Node: Precedence, Next: Contextual Precedence, Prev: Shift/Reduce, Up: Algorithm
-
- Operator Precedence
- ===================
-
- Another situation where shift/reduce conflicts appear is in
- arithmetic expressions. Here shifting is not always the preferred
- resolution; the Bison declarations for operator precedence allow you
- to specify when to shift and when to reduce.
-
- * Menu:
-
- * Why Precedence:: An example showing why precedence is needed.
- * Using Precedence:: How to specify precedence in Bison grammars.
- * Precedence Examples:: How these features are used in the previous example.
- * How Precedence:: How they work.
-
- File: bison.info, Node: Why Precedence, Next: Using Precedence, Prev: Precedence, Up: Precedence
-
- When Precedence is Needed
- -------------------------
-
- Consider the following ambiguous grammar fragment (ambiguous
- because the input `1 - 2 * 3' can be parsed in two different ways):
-
- expr: expr '-' expr
- | expr '*' expr
- | expr '<' expr
- | '(' expr ')'
- ...
- ;
-
- Suppose the parser has seen the tokens `1', `-' and `2'; should it
- reduce them via the rule for the addition operator? It depends on the
- next token. Of course, if the next token is `)', we must reduce;
- shifting is invalid because no single rule can reduce the token
- sequence `- 2 )' or anything starting with that. But if the next
- token is `*' or `<', we have a choice: either shifting or reduction
- would allow the parse to complete, but with different results.
-
- To decide which one Bison should do, we must consider the results.
- If the next operator token OP is shifted, then it must be reduced
- first in order to permit another opportunity to reduce the sum. The
- result is (in effect) `1 - (2 OP 3)'. On the other hand, if the
- subtraction is reduced before shifting OP, the result is
- `(1 - 2) OP 3'. Clearly, then, the choice of shift or reduce should
- depend on the relative precedence of the operators `-' and OP: `*'
- should be shifted first, but not `<'.
-
- What about input such as `1 - 2 - 5'; should this be `(1 - 2) - 5'
- or should it be `1 - (2 - 5)'? For most operators we prefer the
- former, which is called "left association". The latter alternative,
- "right association", is desirable for assignment operators. The
- choice of left or right association is a matter of whether the parser
- chooses to shift or reduce when the stack contains `1 - 2' and the
- look-ahead token is `-': shifting makes right-associativity.
-
- File: bison.info, Node: Using Precedence, Next: Precedence Examples, Prev: Why Precedence, Up: Precedence
-
- Specifying Operator Precedence
- ------------------------------
-
- Bison allows you to specify these choices with the operator
- precedence declarations `%left' and `%right'. Each such declaration
- contains a list of tokens, which are operators whose precedence and
- associativity is being declared. The `%left' declaration makes all
- those operators left-associative and the `%right' declaration makes
- them right-associative. A third alternative is `%nonassoc', which
- declares that it is a syntax error to find the same operator twice "in
- a row".
-
- The relative precedence of different operators is controlled by the
- order in which they are declared. The first `%left' or `%right'
- declaration in the file declares the operators whose precedence is
- lowest, the next such declaration declares the operators whose
- precedence is a little higher, and so on.
-
- File: bison.info, Node: Precedence Examples, Next: How Precedence, Prev: Using Precedence, Up: Precedence
-
- Precedence Examples
- -------------------
-
- In our example, we would want the following declarations:
-
- %left '<'
- %left '-'
- %left '*'
-
- In a more complete example, which supports other operators as well,
- we would declare them in groups of equal precedence. For example,
- `'+'' is declared with `'-'':
-
- %left '<' '>' '=' NE LE GE
- %left '+' '-'
- %left '*' '/'
-
- (Here `NE' and so on stand for the operators for "not equal" and so
- on. We assume that these tokens are more than one character long and
- therefore are represented by names, not character literals.)
-
- File: bison.info, Node: How Precedence, Prev: Precedence Examples, Up: Precedence
-
- How Precedence Works
- --------------------
-
- The first effect of the precedence declarations is to assign
- precedence levels to the terminal symbols declared. The second effect
- is to assign precedence levels to certain rules: each rule gets its
- precedence from the last terminal symbol mentioned in the components.
- (You can also specify explicitly the precedence of a rule. *Note
- Contextual Precedence::.)
-
- Finally, the resolution of conflicts works by comparing the
- precedence of the rule being considered with that of the look-ahead
- token. If the token's precedence is higher, the choice is to shift.
- If the rule's precedence is higher, the choice is to reduce. If they
- have equal precedence, the choice is made based on the associativity
- of that precedence level. The verbose output file made by `-v' (*note
- Invocation::.) says how each conflict was resolved.
-
- Not all rules and not all tokens have precedence. If either the
- rule or the look-ahead token has no precedence, then the default is to
- shift.
-
- File: bison.info, Node: Contextual Precedence, Next: Parser States, Prev: Precedence, Up: Algorithm
-
- Context-Dependent Precedence
- ============================
-
- Often the precedence of an operator depends on the context. This
- sounds outlandish at first, but it is really very common. For
- example, a minus sign typically has a very high precedence as a unary
- operator, and a somewhat lower precedence (lower than multiplication)
- as a binary operator.
-
- The Bison precedence declarations, `%left', `%right' and
- `%nonassoc', can only be used once for a given token; so a token has
- only one precedence declared in this way. For context-dependent
- precedence, you need to use an additional mechanism: the `%prec'
- modifier for rules.
-
- The `%prec' modifier declares the precedence of a particular rule by
- specifying a terminal symbol whose precedence should be used for that
- rule. It's not necessary for that symbol to appear otherwise in the
- rule. The modifier's syntax is:
-
- %prec TERMINAL-SYMBOL
-
- and it is written after the components of the rule. Its effect is to
- assign the rule the precedence of TERMINAL-SYMBOL, overriding the
- precedence that would be deduced for it in the ordinary way. The
- altered rule precedence then affects how conflicts involving that rule
- are resolved (*note Precedence::.).
-
- Here is how `%prec' solves the problem of unary minus. First,
- declare a precedence for a fictitious terminal symbol named `UMINUS'.
- There are no tokens of this type, but the symbol serves to stand for
- its precedence:
-
- ...
- %left '+' '-'
- %left '*'
- %left UMINUS
-
- Now the precedence of `UMINUS' can be used in specific rules:
-
- exp: ...
- | exp '-' exp
- ...
- | '-' exp %prec UMINUS
-
- File: bison.info, Node: Parser States, Next: Reduce/Reduce, Prev: Contextual Precedence, Up: Algorithm
-
- Parser States
- =============
-
- The function `yyparse' is implemented using a finite-state machine.
- The values pushed on the parser stack are not simply token type codes;
- they represent the entire sequence of terminal and nonterminal symbols
- at or near the top of the stack. The current state collects all the
- information about previous input which is relevant to deciding what to
- do next.
-
- Each time a look-ahead token is read, the current parser state
- together with the type of look-ahead token are looked up in a table.
- This table entry can say, "Shift the look-ahead token." In this case,
- it also specifies the new parser state, which is pushed onto the top
- of the parser stack. Or it can say, "Reduce using rule number N."
- This means that a certain of tokens or groupings are taken off the top
- of the stack, and replaced by one grouping. In other words, that
- number of states are popped from the stack, and one new state is
- pushed.
-
- There is one other alternative: the table can say that the
- look-ahead token is erroneous in the current state. This causes error
- processing to begin (*note Error Recovery::.).
-
- File: bison.info, Node: Reduce/Reduce, Next: Mystery Conflicts, Prev: Parser States, Up: Algorithm
-
- Reduce/Reduce Conflicts
- =======================
-
- A reduce/reduce conflict occurs if there are two or more rules that
- apply to the same sequence of input. This usually indicates a serious
- error in the grammar.
-
- For example, here is an erroneous attempt to define a sequence of
- zero or more `word' groupings.
-
- sequence: /* empty */
- { printf ("empty sequence\n"); }
- | word
- { printf ("single word %s\n", $1); }
- | sequence word
- { printf ("added word %s\n", $2); }
- ;
-
- The error is an ambiguity: there is more than one way to parse a single
- `word' into a `sequence'. It could be reduced directly via the second
- rule. Alternatively, nothing-at-all could be reduced into a
- `sequence' via the first rule, and this could be combined with the
- `word' using the third rule.
-
- You might think that this is a distinction without a difference,
- because it does not change whether any particular input is valid or
- not. But it does affect which actions are run. One parsing order
- runs the second rule's action; the other runs the first rule's action
- and the third rule's action. In this example, the output of the
- program changes.
-
- Bison resolves a reduce/reduce conflict by choosing to use the rule
- that appears first in the grammar, but it is very risky to rely on
- this. Every reduce/reduce conflict must be studied and usually
- eliminated. Here is the proper way to define `sequence':
-
- sequence: /* empty */
- { printf ("empty sequence\n"); }
- | sequence word
- { printf ("added word %s\n", $2); }
- ;
-
- Here is another common error that yields a reduce/reduce conflict:
-
- sequence: /* empty */
- | sequence words
- | sequence redirects
- ;
-
- words: /* empty */
- | words word
- ;
-
- redirects:/* empty */
- | redirects redirect
- ;
-
- The intention here is to define a sequence which can contain either
- `word' or `redirect' groupings. The individual definitions of
- `sequence', `words' and `redirects' are error-free, but the three
- together make a subtle ambiguity: even an empty input can be parsed in
- infinitely many ways!
-
- Consider: nothing-at-all could be a `words'. Or it could be two
- `words' in a row, or three, or any number. It could equally well be a
- `redirects', or two, or any number. Or it could be a `words' followed
- by three `redirects' and another `words'. And so on.
-
- Here are two ways to correct these rules. First, to make it a
- single level of sequence:
-
- sequence: /* empty */
- | sequence word
- | sequence redirect
- ;
-
- Second, to prevent either a `words' or a `redirects' from being
- empty:
-
- sequence: /* empty */
- | sequence words
- | sequence redirects
- ;
-
- words: word
- | words word
- ;
-
- redirects:redirect
- | redirects redirect
- ;
-
- File: bison.info, Node: Mystery Conflicts, Next: Stack Overflow, Prev: Reduce/Reduce, Up: Algorithm
-
- Mysterious Reduce/Reduce Conflicts
- ==================================
-
- Sometimes reduce/reduce conflicts can occur that don't look
- warranted. Here is an example:
-
- %token ID
-
- %%
- def: param_spec return_spec ','
- ;
- param_spec:
- type
- | name_list ':' type
- ;
- return_spec:
- type
- | name ':' type
- ;
- type: ID
- ;
- name: ID
- ;
- name_list:
- name
- | name ',' name_list
- ;
-
- It would seem that this grammar can be parsed with only a single
- token of look-ahead: when a `param_spec' is being read, an `ID' is a
- `name' if a comma or colon follows, or a `type' if another `ID'
- follows. In other words, this grammar is LR(1).
-
- However, Bison, like most parser generators, cannot actually handle
- all LR(1) grammars. In this grammar, two contexts, that after an `ID'
- at the beginning of a `param_spec' and likewise at the beginning of a
- `return_spec', are similar enough that Bison assumes they are the
- same. They appear similar because the same set of rules would be
- active--the rule for reducing to a `name' and that for reducing to a
- `type'. Bison is unable to determine at that stage of processing that
- the rules would require different look-ahead tokens in the two
- contexts, so it makes a single parser state for them both. Combining
- the two contexts causes a conflict later. In parser terminology, this
- occurrence means that the grammar is not LALR(1).
-
- In general, it is better to fix deficiencies than to document them.
- But this particular deficiency is intrinsically hard to fix; parser
- generators that can handle LR(1) grammars are hard to write and tend to
- produce parsers that are very large. In practice, Bison is more useful
- as it is now.
-
- When the problem arises, you can often fix it by identifying the two
- parser states that are being confused, and adding something to make
- them look distinct. In the above example, adding one rule to
- `return_spec' as follows makes the problem go away:
-
- %token BOGUS
- ...
- %%
- ...
- return_spec:
- type
- | name ':' type
- /* This rule is never used. */
- | ID BOGUS
- ;
-
- This corrects the problem because it introduces the possibility of
- an additional active rule in the context after the `ID' at the
- beginning of `return_spec'. This rule is not active in the
- corresponding context in a `param_spec', so the two contexts receive
- distinct parser states. As long as the token `BOGUS' is never
- generated by `yylex', the added rule cannot alter the way actual input
- is parsed.
-
- In this particular example, there is another way to solve the
- problem: rewrite the rule for `return_spec' to use `ID' directly
- instead of via `name'. This also causes the two confusing contexts to
- have different sets of active rules, because the one for `return_spec'
- activates the altered rule for `return_spec' rather than the one for
- `name'.
-
- param_spec:
- type
- | name_list ':' type
- ;
- return_spec:
- type
- | ID ':' type
- ;
-
- File: bison.info, Node: Stack Overflow, Prev: Mystery Conflicts, Up: Algorithm
-
- Stack Overflow, and How to Avoid It
- ===================================
-
- The Bison parser stack can overflow if too many tokens are shifted
- and not reduced. When this happens, the parser function `yyparse'
- returns a nonzero value, pausing only to call `yyerror' to report the
- overflow.
-
- By defining the macro `YYMAXDEPTH', you can control how deep the
- parser stack can become before a stack overflow occurs. Define the
- macro with a value that is an integer. This value is the maximum
- number of tokens that can be shifted (and not reduced) before overflow.
- It must be a constant expression whose value is known at compile time.
-
- The stack space allowed is not necessarily allocated. If you
- specify a large value for `YYMAXDEPTH', the parser actually allocates
- a small stack at first, and then makes it bigger by stages as needed.
- This increasing allocation happens automatically and silently.
- Therefore, you do not need to make `YYMAXDEPTH' painfully small merely
- to save space for ordinary inputs that do not need much stack.
-
- The default value of `YYMAXDEPTH', if you do not define it, is
- 10000.
-
- You can control how much stack is allocated initially by defining
- the macro `YYINITDEPTH'. This value too must be a compile-time
- constant integer. The default is 200.
-
- File: bison.info, Node: Error Recovery, Next: Context Dependency, Prev: Algorithm, Up: Top
-
- Error Recovery
- **************
-
- It is not usually acceptable to have a program terminate on a parse
- error. For example, a compiler should recover sufficiently to parse
- the rest of the input file and check it for errors; a calculator
- should accept another expression.
-
- In a simple interactive command parser where each input is one
- line, it may be sufficient to allow `yyparse' to return 1 on error and
- have the caller ignore the rest of the input line when that happens
- (and then call `yyparse' again). But this is inadequate for a
- compiler, because it forgets all the syntactic context leading up to
- the error. A syntax error deep within a function in the compiler
- input should not cause the compiler to treat the following line like
- the beginning of a source file.
-
- You can define how to recover from a syntax error by writing rules
- to recognize the special token `error'. This is a terminal symbol that
- is always defined (you need not declare it) and reserved for error
- handling. The Bison parser generates an `error' token whenever a
- syntax error happens; if you have provided a rule to recognize this
- token in the current context, the parse can continue.
-
- For example:
-
- stmnts: /* empty string */
- | stmnts '\n'
- | stmnts exp '\n'
- | stmnts error '\n'
-
- The fourth rule in this example says that an error followed by a
- newline makes a valid addition to any `stmnts'.
-
- What happens if a syntax error occurs in the middle of an `exp'?
- The error recovery rule, interpreted strictly, applies to the precise
- sequence of a `stmnts', an `error' and a newline. If an error occurs
- in the middle of an `exp', there will probably be some additional
- tokens and subexpressions on the stack after the last `stmnts', and
- there will be tokens to read before the next newline. So the rule is
- not applicable in the ordinary way.
-
- But Bison can force the situation to fit the rule, by discarding
- part of the semantic context and part of the input. First it discards
- states and objects from the stack until it gets back to a state in
- which the `error' token is acceptable. (This means that the
- subexpressions already parsed are discarded, back to the last complete
- `stmnts'.) At this point the `error' token can be shifted. Then, if
- the old look-ahead token is not acceptable to be shifted next, the
- parser reads tokens and discards them until it finds a token which is
- acceptable. In this example, Bison reads and discards input until the
- next newline so that the fourth rule can apply.
-
- The choice of error rules in the grammar is a choice of strategies
- for error recovery. A simple and useful strategy is simply to skip
- the rest of the current input line or current statement if an error is
- detected:
-
- stmnt: error ';' /* on error, skip until ';' is read */
-
- It is also useful to recover to the matching close-delimiter of an
- opening-delimiter that has already been parsed. Otherwise the
- close-delimiter will probably appear to be unmatched, and generate
- another, spurious error message:
-
- primary: '(' expr ')'
- | '(' error ')'
- ...
- ;
-
- Error recovery strategies are necessarily guesses. When they guess
- wrong, one syntax error often leads to another. In the above example,
- the error recovery rule guesses that an error is due to bad input
- within one `stmnt'. Suppose that instead a spurious semicolon is
- inserted in the middle of a valid `stmnt'. After the error recovery
- rule recovers from the first error, another syntax error will be found
- straightaway, since the text following the spurious semicolon is also
- an invalid `stmnt'.
-
- To prevent an outpouring of error messages, the parser will output
- no error message for another syntax error that happens shortly after
- the first; only after three consecutive input tokens have been
- successfully shifted will error messages resume.
-
- Note that rules which accept the `error' token may have actions,
- just as any other rules can.
-
- You can make error messages resume immediately by using the macro
- `yyerrok' in an action. If you do this in the error rule's action, no
- error messages will be suppressed. This macro requires no arguments;
- `yyerrok;' is a valid C statement.
-
- The previous look-ahead token is reanalyzed immediately after an
- error. If this is unacceptable, then the macro `yyclearin' may be
- used to clear this token. Write the statement `yyclearin;' in the
- error rule's action.
-
- For example, suppose that on a parse error, an error handling
- routine is called that advances the input stream to some point where
- parsing should once again commence. The next symbol returned by the
- lexical scanner is probably correct. The previous look-ahead token
- ought to be discarded with `yyclearin;'.
-
- The macro `YYRECOVERING' stands for an expression that has the
- value 1 when the parser is recovering from a syntax error, and 0 the
- rest of the time. A value of 1 indicates that error messages are
- currently suppressed for new syntax errors.
-
- File: bison.info, Node: Context Dependency, Next: Debugging, Prev: Error Recovery, Up: Top
-
- Handling Context Dependencies
- *****************************
-
- The Bison paradigm is to parse tokens first, then group them into
- larger syntactic units. In many languages, the meaning of a token is
- affected by its context. Although this violates the Bison paradigm,
- certain techniques (known as "kludges") may enable you to write Bison
- parsers for such languages.
-
- * Menu:
-
- * Semantic Tokens:: Token parsing can depend on the semantic context.
- * Lexical Tie-ins:: Token parsing can depend on the syntactic context.
- * Tie-in Recovery:: Lexical tie-ins have implications for how
- error recovery rules must be written.
-
- (Actually, "kludge" means any technique that gets its job done but
- is neither clean nor robust.)
-